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O ■ Abstract 

u 

-M ' Methods for Bayesian simulation in the presence of computationally intractable 

C^ ■ 

c/3 . likelihood functions are of growing interest. Termed likelihood-free samplers, 

standard simulation algorithms such as Markov chain Monte Carlo have been 
adapted for this setting. In this article, by presenting generalisations of existing 
r^ I algorithms, we demonstrate that likelihood-free samplers can be ambiguous over 



the form of the target distribution. We also consider the theoretical justification 



^D I of these samplers. Distinguishing between the forms of the target distribution 



may have implications for the future development of likelihood- free samplers. 
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1 Introduction 

Bayesian inference proceeds via the posterior distribution n{9\y) oc f{y\9)7i{9), the 
updating of prior information 7t{6) for a parameter 6 E Q through the hkehhood 
function f{y\0) after observing data y ^y. Numerical algorithms, such as importance 
sampling, Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC), 
are commonly employed to draw samples from the posterior Ti{9\y). 

There is growing interest in posterior simulation in situations where the likelihood 
function is computationally intractable i.e. f{y\0) may not be numerically evaluated 
pointwise. As a result, sampling algorithms based on repeated likelihood evaluations 
require modification for this task. Collectively known as likelihood-free samplers (and 
also as approximate Bayesian computation) these methods have been developed across 
multiple disciplines and literatures. They employ generation of auxiliary datasets 
under the model as a means to circumvent (intractable) likelihood evaluation. 

In this article we present general forms of two likelihood-free models, and extend 
earlier likelihood-free samplers (based on rejection sampling and MCMC) to these 
models. In doing so, we demonstrate that likelihood-free samplers are sometimes am- 
biguous over the exact form of their target distribution: in particular whether samples 
are obtained from the joint distribution of model parameters and auxiliary datasets, 
or from the marginal distribution of model parameters only. The interpretation of the 
auxiliary datasets is quite distinct in each case: under the joint distribution target 
they play the role of auxiliary parameters, whereas under the marginal distribution 
target they are simply a means to approximate the likelihood function under Monte 
Carlo integration. It may be important for the future development of likelihood- 
free samplers to make clear the distinction between the two different forms of target 
distribution, and the interpretation of the auxiliary datasets. 

In Section [2] we establish the notation and models underlying likelihood- free meth- 
ods. In Section [3] we consider importance sampling, MCMC and SMC algorithms in 
turn, and discuss sampler validity and algorithm equivalence under both target dis- 



tributions. We conclude with a summary and discussion in Section |H 

2 Models for computationally intractable likelihoods 

In essence, likelihood-free methods first reduce the observed data, y, to a low-dimensional 
vector of summary statistics ty = T{y) G T, where dim(^) < dim(tj^) << dim{y). 
Accordingly, the true posterior ir^Oly) is replaced with a new posterior 7r{d\ty). These 
are equivalent if ty is sufficient for 6, and 7r{6\ty) ^ 7r(^|y) is an approximation if 
there is some loss of information through ty. The new target posterior, 7T{6\ty), still 
assumed to be computationally intractable, is then embedded within an augmented 
model from which sampling is viable. Specifically the joint posterior of the model 
parameters 9, and auxiliary data t E T given observed data ty is 

7T{e,t\ty) (X Knity -t)f{t\e)7r{e), (1) 

where t ~ f{t\0) niay be interpreted as the vector of summary statistics t = T{x) 
computed from a dataset simulated according to the model x ~ f{x\6). Assuming 
such simulation is possible, data-generation under the model, t ~ f{t\d), forms the 
basis of computation in the likelihood- free setting - see Section [31 The target marginal 
posterior 7rM{0\ty) for the parameters 9, is then obtained as 



TTMioity) = CM I Khity - t)f{t\e)7T{e)dt (2) 

where (ca/)"^ = Jq j^ Khity — t) f {t\6)rc{6)dtd6 normalises (j2]) such that it is a density 
in 9 (e.g. IReeves and Pettitt 20051 IWilkinson 20081 IBlum 20101 ISisson and Fan 20101 



Fernhead and Prangle 2010 ). The function Khity — t) is a standard kernel func- 
tion, with scale parameter /i > 0, which weights the intractable posterior with 
high density in regions t ^ ty where auxiliary and observed datasets are similar. 
As such, 7rjv/(^|ty) ^ 7r{6\ty) forms an approximation to the intractable posterior 
via (121) through standard smoothing arguments (e.g. IBlum 2010[) . In the case as 
h —> 0, so that Kh{ty — t) becomes a point mass at the origin (i.e. ty = t) and 



is zero elsewhere, if ty is sufficient for 6 then the intractable posterior marginal 
T^Aii^^lty) = '^{^\ty) = T^i^lv) IS recovercd exactly (although small h is usually imprac- 
tical - see Section |3]). Various choices of smoothing kernel K have been examined in 
the literature (e.g. [Marjoram et al. 2003[ IBeaumont et al. 2002} IPeters et al. 2009| 
IPeters et al. 20101 iBhmi 20101 ISisson and Fan 2010p . 

For our discussion on likelihood-free samplers, it is convenient to consider a gener- 
alisation of the joint distribution ([1]) incorporating S > 1 auxiliary summary vectors 



where t^'^ = (t^,...,t^) and t^,...,t^ ~ f(t\0) are S independent datasets gen- 
erated from the (intractable) model. As the auxiliary datasets are, by construc- 
tion, conditionally independent given 9, we have f(t^'^\9) = Y[s=i fi^^l^)- ^^ follow 
Del Moral et al. (2008)| and specify the kernel K as Khity.t^-^) = 5"^ J^Li ^hity - 
t*), which produces the joint posterior 



T^A 
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s=l 



\{fit'\o) 
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with cj > the appropriate normalisation constant, where in ([3]) we extend the 



uniform kernel choice of K{ty — t^) by Del Moral et al. (2008) to the general case. 
It is easy to see that, by construction, f^g7ij{9,t^'^\ty)dt^''^ = 7rM{9\ty) admits the 
distribution (|2]) as a marginal distribution (c.f. IDel Moral et al. 2008p . The case 

t\ty) corresponds to the more usual joint posterior ([T]) 



S = 1 with7rj(^,t^^*|t. 



vr 



■,U^Uy 



in the likelihood-free setting. 

There are two obvious approaches to posterior simulation from 7iM{9\ty) ~ n{9\ty) 
as an approximation to 'JT{9\y). The first approach proceeds by sampling directly 
on the augmented model it j {9, t^''^\ty), realising joint samples {9,t^'^) G 6 x T^ 
before a posteriori marginalisation over t^'^ (i.e. by discarding the t'^ realisations 
from the sampler output). In this approach, the summary quantities t^'^ are treated 
as parameters in the augmented model. The second approach is to sample from 



'^Aii^lty) directly, a lower dimensional space, by approximating the integral ([2]) via 
Monte Carlo integration in lieu of each posterior evaluation of vrM(6'|ty). In this case 

TTMieity) (X 7r{9) [ Kh{ty - t)f{t\e)dt ^ "^y^Khity - n := 71^(^1^,), (4) 

where t^, . . . ,t^ ~ fiA^^)- This expression, examined by various authors (e.g. Marjoram et al. 2003 
IReeves and Pettitt 20051 ISisson et al.^OOTl IRatmann et al. 20091 IToni et al. 2"009l 
IPeters et al. 2009|l . requires multiple generated datasets t^, . . . ,t^, for each evalua- 



tion of the marginal posterior distribution 7rM{d\ty)- As with standard Monte Carlo 
approximations, Var[7rA/(6'|tj^)] reduces as S increases, with lim5_^.oo Var[7rA/(^|ty)] = 0. 
For the marginal posterior distribution, the quantities t^'^ serve only as a means to 
estimate vrM(6'|ty), and do not otherwise enter the model explicitly. The number of 
samples S directly impacts on the variance of the estimation. 

We now examine the relationships between, and technical validity of, likelihood- 
free samplers constructed with 7TM{0\ty) and 7ij{9,t^'-^\ty) as the target distribution. 

3 Sampler ambiguity and validity 

In this section we examine each of the basic sampler types: rejection sampling, MCMC 
and population-based methods. We extend the first two of these algorithms to mul- 
tiple data generations {S > 1). We will examine sampler validity with respect to 
the two target posterior distributions vrA/(^|tj^) and 7rj{6,t^'^\ty), and demonstrate 
algorithm equivalence under both the joint and marginal distributional targets. 

3.1 Rejection samplers 

Rejection-based likelihood-free samplers were developed in the population genetics 
literature fITavare et al. 19971 IPritchard et al. 19991 [Marjoram et al. 2003p. Table [D 



presents a generalisation of the rejection sampling algorithm. The specific case of 



S" = 1 is the original implementation of the sampler. We now demonstrate that this 
algorithm has both TTM{0\ty) and TTj{9,t^'-^\ty) as target distributions. 

LF-REJ Algorithm 

1. Generate 6 ~ it{6) from the prior. 

2. Generate t^, . . . , t'^ ~ /(^l^) independently from the model. 

3. Accept 9 with probability proportional to ^ X^^^i K^ity — t^)- 

Table 1: The generalised likelihood- free rejection sampling (LF-REJ) algorithm. 

We first assume the joint model target vrj(6', t^'^\ty), under the LF-REJ algorithm. 
Following Table [H a sample (6', t^'-^) is first drawn from the prior predictive distribu- 
tion 7r(6',t^-"^) = niO) Y\s=i fi't'^l^) (steps 1 and 2). The acceptance probability (step 
3) for {6, t^'^) under a rejection sampler targeting ([3]) is proportional to 

as indicated in Table [1] A 'posteriori marginalisation over t^'^ G T^ (by discarding 
the t^''^ realisations) then provides draws from i^M{Q\ty)- 

If we now assume the marginal model target, 7rA/(^|tj^), a sample Q is first drawn 
from the prior (Tabled! step 1). The acceptance probability for this sample is then 
proportional to 7rjv/(6'|ty)/7r(6'), which via (jlj) is itself approximately proportional to 

s 



T^M{0\ty] 



^j2Mty-n, 



7T{e) s 

using the Monte Carlo draws t^, . . . ,t^ from the model (steps 2 and 3). Note that 
while TrM{0\ty)/Tr{O) is an approximation of the acceptance rate, it is unbiased for all 
S > 1. Thus, while smaller S will result in more variable acceptance probabilities, 
the accepted samples will still correspond to draws from iiMiOlty) for all S > 1. 
Hence, from the above we have that the LF-REJ algorithm successfully targets both 
7rj(^, t^-'^|tj^) and 7iMiO\ty), for any S > 1. 
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3.2 Markov chain Monte Carlo samplers 

MCMC-based likelihood-free samplers were introduced to avoid rejection sampling in- 
efficiencies when the posterior and prior were sufficiently different (Marjoram et al. 2003 



IBortot et al. 20071 ISisson and Fan^OTOl) . The generalised hkelihood-free MCMC al- 
gorithm for 5 > 1 is presented in Table [H Again, S = 1 with a uniform kernel, K, is 
the original implementation of this sampler. 

LF-MCMC Algorithm 

Initialise 9i (and t\'^ = {t\, . . . ,tf) with t^ ~ /(t|6'i) drawn from the model) 
At stage 77, > 1 

1. Generate 6 ~ q{On, 0) from a proposal distribution. 

2. Generate t^'^ = (t^, . . . ,t^) with t'^ ~ f{t\0) drawn independently from the model. 

3. With probability min {l, fg^^f^gS^} -cept 6^^, = 6, (t-^^ = t^^) 

otherwise set 9n+i = On, {t^^^ = t^^)- 

4. Increment n = n + 1 and go to 1. 

Table 2: The generalised likelihood-free MCMC (LF-MCMC) algorithm. Statements in 
parentheses involving t^ relate to sampler with target 'Kj[9,t^ \ty). 

The LF-MCMC sampler was introduced in the context of targeting the marginal 



posterior distribution nMid\ty). Marjoram et al. (2003) and Wegmann et al. (2009) 



(for S = 1) present variations on proofs of detailed balance under this assumption. We 
now demonstrate that for finite (i.e. practical values of) S, the LF-MCMC sampler 
is theoretically only valid under the joint posterior target 7rj{6,t^'^\ty). 

Implementing the LF-MCMC sampler assuming the marginal posterior target 
7rj\/(6'|ty), and a proposal density g(6'„, 9) for 9, the probability of accepting the move 
from 9n at time n to a proposed value 9 ~ q{9n, 9) is given by 

via (j4]). Unlike rejection sampling, where the acceptance probability is proportional 



to an unbiased estimate TtM{0\ty)/Ti{9), the above Markov chain acceptance prob- 
abihty consists of a ratio of two unbiased estimates T^MiG\iy)/T^MiGn\'ty)- As such, 
the estimate of the acceptance probabihty (involving this ratio) is biased, as in gen- 
eral E[X/F] ^ E[X]/E[F]. Only as S* -^ oo so that the bias of the ratio dimin- 
ishes, can this algorithm target the marginal posterior TiM{d\ty)- Many authors (e.g. 
Marjoram et al. 2003; IBortot et al. 2007t Wegmann et al. 2009 and others) imple- 



ment the LF-MCMC algorithm with S* = 1, which by this argument appears too 
small to result in an unbiased sampler targeting TTM{d\ty)- 

If we now consider an MCMC algorithm targeting the joint posterior 7rj(^, t^''^\ty), 
taking the specific form in Equation Q, the probability of accepting a proposed move 
from {9nit]l^) at time n to 

s 

s=l 

at time n + 1, is then 



mill < 1 — — > = rniii < 1 

1 'nj{er.,ti^s\t^)q[ier.,ti^s^,ie,t^--s)]j \ '^EsMty-tf,)n{9M9^,e) 

(6) 
This sampler correctly targets 7rj(6', t^''^\ty) by construction, and the acceptance prob- 
ability iQ is exact. 

Hence, through the equivalence of the acceptance probabilities (^ and (^, the 
auxiliary variable LF-MCMC sampler targeting 7Tj{6,t^'^\ty) results in exactly the 
same algorithm as an LF-MCMC sampler targeting TTM{6\ty), for any S* > 1. Thus, 
despite the above argument of bias in marginal samplers for finite S, implementa- 
tions of marginal LF-MCMC samplers are in practice unbiased for S* > 1, in that the 
sampler must correctly produce draws from 7rA/(6'|tj,). However, this practical unbi- 
asedness is strictly only available through that conveyed by the equivalent sampler 
targeting TTj{9,t^'-^\ty). 



3.3 Population-based samplers 

Population-based likelihood-free samplers were introduced to circumvent poor mix- 
ing in MCMC samplers f lSisson et al. 20071 IToni et al. 20091 IBeaumont et al. 20091 
IPeters et al. 2009t IDel Moral et al. 2008p . These samplers propagate a population 

of particles, 6^^\ . . . , 6^'^\ with associated importance weights iy(^^*^), through a se- 
quence of related densities 0i(^i), . . . , (pnif^n), which defines a smooth transition from 
the distribution 0i, from which direct sampling is available, to 0„ the target distri- 
bution. For likelihood-free samplers, 0^ is defined by allowing Kh^{ty — t) to place 
greater density on regions for which ty k, t as k increases (that is, the bandwidth hk 
decreases with k). Hence, we denote T^j,k{G^t^'^\iy) oc Khf,{ty,t^'^)f{i}'^\6)-n{6) and 
'^M,k{0\ty) OC 7r(6') j^s Khi^{ty,t^'-^)f{t^'-^\9)dt^'-^ for /c = 1, . . . ,n, under the joint and 
marginal posterior models respectively. 

3.3.1 Sequential Monte Carlo-based samplers 



Under the sequential Monte Carlo samplers algorithm ( JDel Moral et al. 20061) the 



particle population 6k-i drawn from the distribution (f)k-i{(^k-i) at time fc — 1 is 
mutated to (pki^k) by the kernel Mk{6k-i,0k)- The weights for the mutated particles 
9k may be obtained as Wk{Ok) = Wk-i{0i;^i)wk{0k_i,9k) where, for the marginal 
model sequence 7rM,k{0k\ty), the incremental weight is 

/^ n \ _ '^M,k{Gk\ty)Lk~l (Ok, Gk-l) ^ T^M,k{Gk\'ty)Lk-l (Ok, Gk~l) /„N 

T^M,k~l{dk-l\ty)Mk {9k-l, 9k) TlM,k-l{9k-l\ty)Mk {9k-l, Ok) 

where, following (jl]), 

T^M,k{^k\ty) ■■= —^y^^'^h^.ity - f) 
^ s=l 

is proportional to an (unbiased) estimate of irM^kidklty) based on S Monte Carlo 
draws t^, . . . , t*^ ~ f{t\6k). Here Lk_i (6*^, 6'fc_i) is a reverse-time kernel describing the 
mutation of particles from (pki^^k) at time k to (f)k~i{dk-i) at time k — 1. As with 
the LF-MCMC algorithm, the incremental weight ([7]) consists of the "biased" ratio 
7rM,k{^k\ty)/Tik-ii6M,k-i\ty) for finite S > 1. 

9 



If we now consider a sequential Monte Carlo sampler under the joint model 
7Tj^k{(^,t^'^\ty), with the natural mutation kernel factorisation 

s 

s=l 

(and similarly for Lk-i), following the form of (I7j), the incremental weight is exactly 
„, Un +^--S\ (a i^--s\\ ^ T^s^HJiy - tl)T^{Ok)Lk-i (gfc,gfc-i) 

Hence, as the incremental weights ([3, [8]) are equivalent, they induce identical SMC 
algorithms for both marginal and joint models vta/ (6'|ty) and 7rj(6', t^'^\ty). As a result, 
while applications of the marginal sampler targeting 7rM{6\y) are theoretically biased 
for finite 5 > 1, as before, they are in practice unbiased through association with the 
equivalent sampler on joint space targeting 7rj{9,t^''^\ty). 

We note that a theoretically unbiased sampler targeting nM{0\ty), for all S > 
1, can be obtained by careful choice of the kernel Lk_i{6k,0k-i)- For example. 



Peters et al. (2009)| use the suboptimal kernel fIDel Moral et al. 2006|1 

r rn a \ _ nM,k-l{Ok~l\ty) Mk{Ok~l, Ok) ,r,^ 

Lk-i[Uk,Uk^l — J — 1 X , , ,^ Z"T3Z ' ^^ 

J TiM^k^i{0k^i\ty)Mk{9k^i,9k)d9k-i 
from which the incremental weight ([7]) is approximated by 

Wk{0k-i,0k) = iiM,k{Ok\ty)/ I iiM,k-i{0k-i\ty)Mk{9k-i,0k)d6k-i 

N 

~ ^M,k{dk\ty)/Y,^k-i{eti)Mk{ef_^,ek). (lo) 

Under this choice of backward kernel, the weight calculation is now unbiased for all 
S* > 1, since the approximation TiM,k~i{d\y) in the denominator of ([7]) is no longer 
needed. 

In practice, application of SMC samplers in the likelihood-free setting requires the 



avoidance of severe particle depletion. Targeting 717(6*, t^''^|ty), Del Moral et al. (2008) 
use a standard MCMC kernel in combination with a large number of slowly changing 
distributions 'i^,j,k{Oit^'^\ty) to maintain particle diversity. In an alternative approach 
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targeting 7rM{0\ty), Peters et al. (2009) probabilistically reject particles with weight 
below a given threshold. The final form of the weight including the rejection mecha- 
nism involves the form flTOj) . and so is unbiased for all 5 > 1. 

3.3.2 Alternative population-based samplers 



Sisson et al. (2007) , Toni et al. (2009) and Beaumont et al. (2009) propose alterna- 
tive population-based likelihood-free algorithms. While deriving from different sam- 
pling frameworks, they are essentially the same sampler and utilise importance- 
sampling weights of the form (TTOl) . Following the arguments in Section 13. ![ such 
samplers successfully target both 7rj\/(6'|tj^) and 7rj(6', t^-'^|ty) for all S' > 1, and pro- 
duce identical algorithms. 

4 Discussion 

In this article, we have extended some existing likelihood-free samplers to incorpo- 
rate multiple {S > 1) auxiliary data generations, t^'^ G T'^ ■ In doing so, we have 
established an ambiguity over the target distribution of such samplers, which is prob- 
lematic from an interpretative perspective. Those algorithms targeting vrjv/ (6'|ty), and 
requiring estimates of likelihood ratios within acceptance probabilities or importance 
weights, require the number of Monte Carlo draws 5 — t- oo to avoid a theoreti- 
cal bias. Fortunately, through an equivalence with a likelihood-free sampler targeting 
7rj(0, t^'^\ty), inferences performed with the marginal posterior sampler are in practice 
unbiased. However, this practical unbiasedness does not justify the sampler targeting 
7rA/(6'|ty). Such samplers can only be theoretically justified from the perspective of the 
joint posterior nj{9,t^'^\ty) given by ([3]) (c.f. IDel Moral et al. 2008^ . Alternative rep- 
resentations of likelihood-free models (e.g. [Wilkinson 2008^ Fernhead and Prangle 2010) 
may not offer this interpretation. 

It may be important for the future development of efficient likelihood- free samplers 
to make clear the distinction between the two different forms of target distribution. 
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For instance, suppose that a future algorithm is sufficiently complicated that the usual 
joint posterior distribution strategy (Section |3]) of cancelling the intractable likelihood 
functions, f{t\6), between the target and proposal distributions is unavailable. The 
sampler must then be implemented with the marginal posterior target, nM{0\ty), via 
Monte Carlo integration. However, if this same algorithm also relies on the evaluation 
of ratios of likelihood estimates, then for finite S, this sampler may not be theoretically 
justified without further investigation. 
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